StringBuilder Extension: IsQuotedBy

Discussion:

Brady Kelly

2008-02-13 23:33:18 UTC

Permalink

Sebastien Lambla

2008-02-13 23:43:37 UTC

Permalink

Brady Kelly

2008-02-13 23:54:03 UTC

Permalink

Peter Obiefuna

2008-02-13 23:56:38 UTC

Permalink

Brady,
This algorithm looks inexpensive enough to me. But I think that whitespace
padding (or non-printables) may mask the result. For that reason, I would
prefer a Regex.Match.
P
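
A minimal sketch of the regex route suggested above, assuming that "padding" means leading and trailing whitespace only. The helper name IsQuotedByIgnoringPadding is hypothetical, and the ToString() call allocates exactly the kind of surplus string the original post wanted to avoid:

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class QuoteRegexSketch
{
    // Hypothetical helper: true when the text, ignoring leading and trailing
    // whitespace, starts and ends with quoteString.
    public static bool IsQuotedByIgnoringPadding(StringBuilder sb, string quoteString)
    {
        // Escape the quote string so characters such as '(' or '*' are literal.
        string q = Regex.Escape(quoteString);
        string pattern = @"^\s*" + q + ".*" + q + @"\s*$";
        // Singleline lets '.' match newlines inside the quoted field.
        return Regex.IsMatch(sb.ToString(), pattern, RegexOptions.Singleline);
    }
}
```

Non-printable characters other than whitespace are not covered by \s and would need an explicit character class.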

--------------------------------------------------
From: "Brady Kelly" <***@CHASESOFTWARE.CO.ZA>
Sent: Wednesday, February 13, 2008 4:33 PM
To: <DOTNET-***@DISCUSS.DEVELOP.COM>
Subject: [DOTNET-CLR] StringBuilder Extension: IsQuotedBy

> I was playing around with avoiding, at all costs, creating surplus strings,
> and came up with the following method to see if a StringBuilder, that I use
> for each of a collection of string fields, is surrounded by a certain
> string, e.g. double quotes. I was just wondering if this is an efficient
> way of doing it:
>
> public static bool IsQuotedBy(this StringBuilder sb, string quoteString)
> {
>     // Check if the first characters match the quote string.
>     for (int i = 0; i < quoteString.Length; i++)
>     {
>         if (sb[i] != quoteString[i])
>         {
>             return false;
>         }
>     }
>
>     // Check if the last characters match the quote string.
>     // qsIndex is declared outside the loop so that it advances with i;
>     // declared inside, it would be reset to 0 on every iteration.
>     int qsIndex = 0;
>     for (int i = sb.Length - quoteString.Length; i < sb.Length; i++)
>     {
>         if (sb[i] != quoteString[qsIndex++])
>         {
>             return false;
>         }
>     }
>
>     return true;
> }
>
>
> ===================================
> This list is hosted by DevelopMentor® http://www.develop.com
>
> View archives and manage your subscription(s) at
> http://discuss.develop.com
>


Shawn Wildermuth

2008-02-14 00:02:12 UTC

Permalink

Greg Young

2008-02-14 00:04:11 UTC

Permalink

For most of these types of things it will usually be one char that
people want to check (like a ") so I would probably overload that case
just to do a quick char check ...

Cheers,

Greg
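
The single-character overload suggested above might look like the following sketch. The length guard and the decision that a lone quote character does not count as "quoted" are assumptions, not something stated in the thread:

```csharp
using System.Text;

public static class StringBuilderQuoteExtensions
{
    // Single-char fast path: two index reads, no loops, no allocations.
    public static bool IsQuotedBy(this StringBuilder sb, char quoteChar)
    {
        // Assumption: at least two characters are needed for an opening
        // and a closing quote, so "" and "\"" are both reported as unquoted.
        return sb.Length >= 2
            && sb[0] == quoteChar
            && sb[sb.Length - 1] == quoteChar;
    }
}
```

A call such as field.IsQuotedBy('"') then resolves to this overload, while the string version stays available for multi-character delimiters.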


--
Studying for the Turing test


Frans Bouma

2008-02-14 07:44:34 UTC

Permalink

Brady Kelly

2008-02-14 07:58:28 UTC

Permalink

Daniel Petersson

2008-02-14 10:31:17 UTC

Permalink

Frans Bouma

2008-02-14 11:28:29 UTC

Permalink

Sébastien Lorion

2008-02-14 14:29:15 UTC

Permalink

On 2/14/08, Frans Bouma <***@xs4all.nl> wrote:
> Did you measure this? My regexp-based parsers for UBB and our own DSL,
> as well as my LR(1) parser based on a regexp-using tokenizer, disagree with
> you. Using compiled regexps, of course (which is the default anyway in .NET
> 2.0+ if I'm not mistaken, as regexps are cached and compiled at first run).
>
> So unless you want to write your own NFA-based tokenizer/scanner,
> which is fun but also a lot of work, regexps can greatly help and aren't
> necessarily slower.
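
On the "compiled by default" point in the quote above: as far as I know, .NET caches the parsed form of patterns used through the static Regex methods, but compilation to IL only happens when RegexOptions.Compiled is passed explicitly. A small sketch, with hypothetical class and method names:

```csharp
using System.Text.RegularExpressions;

public static class CompiledRegexSketch
{
    // Built once and reused. RegexOptions.Compiled is requested explicitly;
    // without it the pattern is interpreted.
    private static readonly Regex LeadingQuote =
        new Regex("^\"", RegexOptions.Compiled);

    public static bool StartsWithQuote(string field)
    {
        return LeadingQuote.IsMatch(field);
    }
}
```

Compiling costs more at construction time, so it tends to pay off only for patterns that are reused many times, which is worth measuring rather than assuming.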

It is only one example, and a case where hand-coding a parser can be
done in a reasonable time, but my CSV reader still beats hands down any
other NFA/DFA-based parser (15x vs. regexp and 5x vs. GoldParser, even
though their parsing expression/grammar did not support multiline fields,
which mine does).

http://www.codeproject.com/KB/database/CsvReader.aspx
http://www.devincook.com/goldparser/

Sébastien


Brady Kelly

2008-02-14 14:43:23 UTC

Permalink


At first glance yours does not support multi-character tokens, such as
separators. I think this brings us back to one of the original replies to
my OP, that I should optimise a case for the more common scenario of single
character tokens, which I believe is what your parser does very well.


Booth, Bill

2008-07-17 15:35:49 UTC

Permalink

Stoyan Damov

2008-07-17 15:48:30 UTC

Permalink

When I reply to your e-mail, the mail is from me (stoyan.damov at
gmail dot com) but the sender is DOTNET-***@discuss.develop.com
resulting in "DOTNET-CLR on behalf of Stoyan". Is it clearer now?
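
The "on behalf of" situation described above can be sketched with System.Net.Mail. The addresses and display names are placeholders, and how a mail client renders the two headers is up to that client:

```csharp
using System.Net.Mail;

public static class MailHeaderSketch
{
    // Builds a message written by one party and submitted by another.
    public static MailMessage BuildOnBehalfOf()
    {
        var message = new MailMessage();
        // From: the author of the message.
        message.From = new MailAddress("author@example.com", "Author");
        // Sender: the party that actually submits it, e.g. a list server.
        message.Sender = new MailAddress("list@example.com", "List Server");
        message.To.Add("reader@example.com");
        message.Subject = "From versus Sender";
        return message;
    }
}
```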

On Thu, Jul 17, 2008 at 6:35 PM, Booth, Bill <***@panamericanlife.com> wrote:
> Does anyone know the difference between the "From" property and the
> "Sender" property in the MailMessage class? I just can't seem to find an
> answer to this question.
>
> Thanks for any help.
>
> Bill


Booth, Bill

2008-07-17 16:07:25 UTC

Permalink

Booth, Bill

2008-09-11 15:46:09 UTC

Permalink

Phil Sayers

2008-09-11 15:51:52 UTC

Permalink

Booth, Bill

2008-09-11 16:20:59 UTC

Permalink

Barry Kelly

2008-02-14 15:01:09 UTC

Permalink

Peter Obiefuna

2008-02-14 16:03:12 UTC

Permalink

>> Daniel said: developing complex parsers without regexp isn't for the
>> faint-hearted =)

Do people still do that? And why would anyone want to do that? If you ask me
it will amount to whipping your own home-grown char-snake-and-ladder
"alphabet state machine" AKA a regex engine.

Wouldn't every lexer have to do this to identify tokens? It seems like an
obligatory step for most really big parsing workflows.
P

--------------------------------------------------
From: "Daniel Petersson" <***@CEFALO.SE>
Sent: Thursday, February 14, 2008 3:31 AM
To: <DOTNET-***@DISCUSS.DEVELOP.COM>
Subject: Re: [DOTNET-CLR] StringBuilder Extension: IsQuotedBy

> perf, strings and loops
>
> I have developed a few .NET based parsers during the last few years and
> here are some quick comments:
>
> 1. avoid creating objects; all kinds of objects are "really expensive" to
> create, especially in loops. GCs generally aren't an issue when parsing,
> unless parsing HUGE documents; a small parser will have completed before
> the GC kicks in.
>
> 2. avoid virtual calls and delegates; both are great for flexibility but
> slow your code down considerably when executed in tight parse loops.
>
> 3. regexp really rocks if you are looking for flexibility; in those cases
> where flexibility is more important than speed, regexp is really the
> perfect match, but if you are working on a small, well-defined and really
> fast parser, don't even think about regexp.
>
> (1), (2) and (3) all trade performance against readability and
> maintainability; I recommend regexp for all "slow" cases, but if you really
> are looking for performance it isn't a good solution. Whether a parse case
> is "slow" or "fast" really depends on your development task, but be advised,
> developing complex parsers without regexp isn't for the faint-hearted =)
>
> regards,
> Daniel
>
> ________________________________________
> From: Discussion of development on the .NET platform using any managed
> language [DOTNET-***@DISCUSS.DEVELOP.COM] On Behalf Of Brady Kelly
> [***@CHASESOFTWARE.CO.ZA]
> Sent: Thursday, February 14, 2008 8:58 AM
> To: DOTNET-***@DISCUSS.DEVELOP.COM
> Subject: Re: [DOTNET-CLR] StringBuilder Extension: IsQuotedBy
>
> Probably, but I always get uneasy creating strings in loops, and this was
> just the culmination of my paranoia, fuelled by an evil combination of
> caffeine and fatigue.
>
>>
>> Isn't a ToString() with a regexp more efficient?
>>
>> FB
>>

Barry Kelly

2008-02-15 16:13:45 UTC

Permalink

Peter Obiefuna

2008-02-15 16:34:30 UTC

Permalink

> Any single given regex terminates in a single state, "matched". A
> tokenizer has multiple ending states, one for each token. This is the
> key difference between most third-party regex libraries and the
> requirements of people writing parsers for well-defined languages.
>
> -- Barry

I would expect a regex implementation to construct an FSM from its
expression string. Meaning that a "single given regex" could terminate in
multiple states, each of which is a "matched" state. If it keeps hitting a
matched state until the input buffer is finished, then, the string
'qualifies'. That, in my mind, is the difference between a state engine and
a collation engine like strcomp (never mind that you can illustrate FSM
graphically by pointing to a final dot on paper). But I expect a Regex
implementation to create a unique FSM from every input signature.


Barry Kelly

2008-02-15 17:05:56 UTC

Permalink

Frans Bouma

2008-02-15 17:39:55 UTC

Permalink

Peter Obiefuna

2008-02-15 17:55:11 UTC

Permalink

This is an illuminating generalization, Frans. However, my response is in
the context of whether to whip up a home-grown DFA to check if a string
inside a .NET StringBuilder contains a quote at its end or not.
P

--------------------------------------------------
From: "Frans Bouma" <***@XS4ALL.NL>
Sent: Friday, February 15, 2008 10:39 AM
To: <DOTNET-***@DISCUSS.DEVELOP.COM>
Subject: Re: [DOTNET-CLR] StringBuilder Extension: IsQuotedBy

>> Peter Obiefuna <***@HOTMAIL.COM> wrote:
>> > If it keeps hitting a
>> > matched state until the input buffer is finished, then, the string
>> > 'qualifies'. That, in my mind, is the difference between a state
>> > engine and
>> > a collation engine like strcomp (never mind that you can illustrate FSM
>> > graphically by pointing to a final dot on paper). But I expect a Regex
>> > implementation to create a unique FSM from every input signature.
>>
>> I don't understand your last sentence. The FSM is constructed for the
>> regex, not for the sentence to be matched.
>>
>> Also, I don't see how this is relevant to why one wouldn't "still"
>> hand-write a DFA.
>
> I think it depends on your language. Take for example UBB syntax vs.
> a programming language. The first has tokens which are surrounded by
> markers. The second doesn't. The first has whitespace + text outside the
> marked areas, which is all treated the same: ignore it and see it as one
> token. The second doesn't; it has to consider every input character.
>
> The first can be done with a regexp tokenizer pretty easily and very
> quickly: one can find the tokens in the text without a lot of effort, just
> define the regexp per token. (An open source example I wrote some time ago
> is available here: http://www.llblgen.com/hnd) The second can't use that
> setup because every non-whitespace is input to tokenize. Using a regexp
> tokenizer would be too slow (as it's too inefficient). The first is
> efficient as it only has to consider the marked areas; the rest is ignored.
>
> So if you're writing a parser for a language where every
> non-whitespace is a token, you indeed need your own state machine to
> tokenize the input.
>
> I think both have a point: regexps already define NFAs internally for
> their expression, why not utilize those? Of course, if your language
> doesn't fit that setup, you need your own. (Aho Sethi Ullman to the rescue ;))
>
> FB
>

Barry Kelly

2008-02-15 19:41:16 UTC

Permalink

Brady Kelly

2008-02-16 07:34:41 UTC

Permalink

Barry Kelly

2008-02-16 09:28:32 UTC

Permalink

Peter Obiefuna

2008-02-15 17:49:18 UTC

Permalink

>> But I expect a Regex
>> implementation to create a unique FSM from every input signature.
> > P
>
> I don't understand your last sentence. The FSM is constructed for the
> regex, not for the sentence to be matched.
>

By input signature, I mean the signature of the machine to be created (ie:
the expression or regex).

> Also, I don't see how this is relevant to why one wouldn't "still"
> hand-write a DFA.
>
> -- Barry

The point is that a finite state machine is a DFA and is generated by every
Regex engine implementation from the first input tape (i.e. the regex); it
then moves over the second tape (the input string to evaluate). The input
string passes if the machine gets to a finish state, and there will be many
of them in most real cases. I don't agree that whipping up a new custom
parser is guaranteed to do the job faster until we see some real theoretical
analysis or empirical data. I also don't buy the claim that it's so trivial
a task as to require only 1 hour of a developer's time (not to mention
testing time). Of course, the dragon book has a 100-line lexer (I hope we all
remember that enticing chapter).
I hope you now see the relevance, Barry. I probably didn't make the
connection clear the first time.
P

--------------------------------------------------
From: "Barry Kelly" <***@GMAIL.COM>
Sent: Friday, February 15, 2008 10:05 AM
To: <DOTNET-***@DISCUSS.DEVELOP.COM>
Subject: Re: [DOTNET-CLR] StringBuilder Extension: IsQuotedBy

> Peter Obiefuna <***@HOTMAIL.COM> wrote:
>
>> > Any single given regex terminates in a single state, "matched". A
>> > tokenizer has multiple ending states, one for each token. This is the
>> > key difference between most third-party regex libraries and the
>> > requirements of people writing parsers for well-defined languages.
>> >
>> > -- Barry
>>
>> I would expect a regex implementation to construct an FSM from its
>> expression string. Meaning that a "single given regex" could terminate in
>> multiple states, each of which is a "matched" state.
>
> Yes; but most third-party regex implementations don't distinguish
> between matched states in the API, so in effect, all the matched states
> are the same state. You'd need to extend a regex language to distinguish
> between them.
>
> In other words, for some regex R:
>
> R ::= 'a' | 'b' | 'c' .
>
> We'd usually construct:
>
> s0 - start state
> s0 -'a'-> s1
> s0 -'b'-> s1
> s0 -'c'-> s1
> s1 - finish state
>
> ... and have slight difficulty in finding out which of a, b, or c we
> saw.
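
For the a | b | c example above, .NET's regex dialect is one that does extend the language enough to tell the alternatives apart: each branch can sit in a named group, and the Match reports which group succeeded. A minimal sketch, where the group names A, B, C and the Classify helper are hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class AlternativeSketch
{
    // One named group per alternative, so the "finish state" is recoverable.
    private static readonly Regex Token = new Regex("(?<A>a)|(?<B>b)|(?<C>c)");

    // Returns "A", "B" or "C" for the first token found, or null if none.
    public static string Classify(string input)
    {
        Match m = Token.Match(input);
        if (!m.Success)
        {
            return null;
        }
        foreach (string name in new[] { "A", "B", "C" })
        {
            if (m.Groups[name].Success)
            {
                return name;
            }
        }
        return null;
    }
}
```

The lookup over group names after every match is extra work compared with a hand-written state machine that lands directly in a per-token state, which is the cost being debated in this thread.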
>
>> If it keeps hitting a
>> matched state until the input buffer is finished, then, the string
>> 'qualifies'. That, in my mind, is the difference between a state engine
>> and
>> a collation engine like strcomp (never mind that you can illustrate FSM
>> graphically by pointing to a final dot on paper). But I expect a Regex
>> implementation to create a unique FSM from every input signature.
>
> I don't understand your last sentence. The FSM is constructed for the
> regex, not for the sentence to be matched.
>
> Also, I don't see how this is relevant to why one wouldn't "still"
> hand-write a DFA.
>
> -- Barry
>
> --
> http://barrkel.blogspot.com/
>

Sébastien Lorion

2008-02-15 19:20:13 UTC

Permalink

A much more appropriate tool for parsing complex languages would be an
LALR parser such as the one I mentioned before.

http://www.devincook.com/goldparser/

Having a grammar file clearly laid out and used to generate the
parsing code is much more maintainable than some contorted monster
regexp. As others have pointed out, you also cannot visit the parse
tree ...

Sébastien


--
Sébastien
www.sebastienlorion.com


Barry Kelly

2008-02-15 19:36:13 UTC

Permalink

Per Bolmstedt

2008-02-14 13:34:09 UTC

Permalink

On Thu, 14 Feb 2008 12:28:29 +0100, Frans Bouma <***@XS4ALL.NL> wrote:

> creating objects is indeed a bit expensive in .NET,

I've heard the exact opposite. And I find this hard to relate to as a
general statement... could you put this in perspective? Is it more expensive
in .NET than on comparable platforms, or is it more expensive in .NET than
comparable implementations that do not create objects?


Sébastien Lorion

2008-02-14 14:15:31 UTC

Permalink

Creating lots of short lived objects in a short time provokes many Gen1
collections, which artificially promote other objects to Gen2 or Gen3. You
then get what Rico Mariani calls a mid-life crisis.

http://blogs.msdn.com/ricom/archive/2003/12/04/41281.aspx

So allocating memory in .NET is fast, but if you abuse it, then you
create yourself a big problem.

Sébastien


--
Sébastien
www.sebastienlorion.com


Barry Kelly

2008-02-14 15:36:40 UTC

Permalink

Sébastien Lorion <***@GMAIL.COM> wrote:

> Creating lots of short lived objects in a short time provoke many Gen1
> which artificially promotes other objects to Gen2 or Gen3. You then
> get what Rico Mariani calls a mid-life crisis.

You have your gen indexes shifted up by one :)

Creating lots of *short-lived* objects is not usually a problem (ideally
you'll be doing actual work, not just allocating), as long as you don't
push gen1 objects into gen2. Gen1 should ideally only hold objects that
couldn't be collected while collecting gen0 simply because they're
currently in use, but will have died by the time a gen1 collection comes
around. Thus, unless you're (ab)using gen1 by virtue of keeping a bunch
of stuff around while doing all this allocation, the cheap creation
stays cheap.

In a bit more detail: if you have a call stack that, when simplified,
looks like this:

ProcessRequest()
{
    stuff = MakeBigObjectGraph();
    loop (lots_of_times)
    {
        MakeAndDiscardShortLivedObjects();
    }
    LengthyProcessing(); // more likely problem
    Process(stuff);
    // discard stuff
}

... then you may be in a bit of trouble, if the usage characteristics of
ProcessRequest don't give the GC a chance to tune gen1 size to be bigger
than the size of the thing returned by MakeBigObjectGraph +
(lots_of_times / gen0_collection_rate_in_loop_iters) *
average_objects_size_in_use_in_loop_body.

Given that most .NET code runs on servers, and most servers have a
predictable request/response pattern, the GC should be able to tune gen1
size appropriately. But measure, of course - both time in GC and with
profiler / windbg to check that wrong objects aren't being promoted and
that gen1 size is as expected.

The above breaks down if the ideal gen1 size would be "too big" (a
different problem - gen1 gets expensive, instead of gen2, and would
probably need redesigning the code) - i.e. MakeBigObjectGraph is too
large, or other processing code takes too much time. But hopefully it
can be seen here that the problem is keeping "stuff" alive too long (in
real time) or making it too big, not the number of short-lived objects.
If the objects truly are short-lived, collecting gen0 should deallocate
it almost completely every time (i.e. promote almost nothing to gen1),
thus making a gen1 collection quite rare, and therefore mid-life crisis
very rare. LengthyProcessing() is more likely a source of problems,
preventing the collection of "stuff".
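
For the "measure" step, a rough first look is possible without a profiler by sampling GC.CollectionCount around the code in question. This is only a sketch: the helper name CountCollections is hypothetical, and the counts say how often each generation was collected, not which objects were promoted:

```csharp
using System;

public static class GcMeasureSketch
{
    // Runs the work and returns how many gen0, gen1 and gen2 collections
    // happened while it ran, in that order.
    public static int[] CountCollections(Action work)
    {
        int[] before =
        {
            GC.CollectionCount(0),
            GC.CollectionCount(1),
            GC.CollectionCount(2),
        };

        work();

        return new[]
        {
            GC.CollectionCount(0) - before[0],
            GC.CollectionCount(1) - before[1],
            GC.CollectionCount(2) - before[2],
        };
    }
}
```

If the loop body really only makes short-lived objects, the gen1 and gen2 numbers should stay small relative to gen0. Growing gen2 counts would hint at the promotion problem described above, and that is the point to reach for the profiler or windbg.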

> http://blogs.msdn.com/ricom/archive/2003/12/04/41281.aspx

-- Barry

--
http://barrkel.blogspot.com/


Curt Hagenlocher

2008-02-14 15:38:25 UTC

Permalink

On Thu, Feb 14, 2008 at 7:36 AM, Barry Kelly <***@gmail.com> wrote:
> Sébastien Lorion <***@GMAIL.COM> wrote:
>
> > Creating lots of short lived objects in a short time provoke many Gen1
> > which artificially promotes other objects to Gen2 or Gen3. You then
> > get what Rico Mariani calls a mid-life crisis.
>
> You have your gen indexes shifted up by one :)

Sure sign of a VB programmer *wink*.

--
Curt Hagenlocher
***@hagenlocher.org


Sébastien Lorion

2008-02-14 16:42:01 UTC

Permalink

Right, the "short lived" was misleading. I meant objects that are
normally not intended to stay around, but because of the way the code was
written/architected, they do stay around. That said, the fact that
allocation and gen0 collection are fast cannot be used as an
excuse for lazy coding. Allocating objects in a tight loop or in methods
like GetHashCode() is still a big no-no ...

Nice catch on gen1-2-3 :p I guess spending less time coding makes you
go back to counting like 99% of the world population ...

P.S. Barry vs Brady: so easy to mistake one for the other :)

Sébastien

On 2/14/08, Barry Kelly <***@gmail.com> wrote:
> Sébastien Lorion <***@GMAIL.COM> wrote:
>
> > Creating lots of short lived objects in a short time provoke many Gen1
> > which artificially promotes other objects to Gen2 or Gen3. You then
> > get what Rico Mariani calls a mid-life crisis.
>
>
> You have your gen indexes shifted up by one :)
>
> Creating lots of *short-lived* objects is not usually a problem (ideally
> you'll be doing actual work, not just allocating), as long as you don't
> push gen1 objects into gen2. Gen1 should ideally only hold objects that
> couldn't be collected while collecting gen0 simply because they're
> currently in use, but will have died by the time a gen1 collection comes
> around. Thus, unless you're (ab)using gen1 by virtue of keeping a bunch
> of stuff around while doing all this allocation, the cheap creation
> stays cheap.
>
> In a bit more detail: if you have a call stack that, when simplified,
> looks like this:
>
> ProcessRequest()
> {
> stuff = MakeBigObjectGraph();
> loop (lots_of_times)
> {
> MakeAndDiscardShortLivedObjects();
> }
> LengthyProcessing(); // more likely problem
> Process(stuff);
> // discard stuff
> }
>
> ... then you may be in a bit of trouble, if the usage characteristics of
> MyRootMethod don't give the GC a chance to tune gen1 size to be bigger
> than the size of the thing returned by MakeBigObjectGraph +
> (lots_of_times / gen0_collection_rate_in_loop_iters) *
> average_objects_size_in_use_in_loop_body.
>
> Given that most .NET code runs on servers, and most servers have a
> predictable request/response pattern, the GC should be able to tune gen1
> size appropriately. But measure, of course - both time in GC and with
> profiler / windbg to check that wrong objects aren't being promoted and
> that gen1 size is as expected.
>
> The above breaks down if the ideal gen1 size would be "too big" (a
> different problem - gen1 gets expensive, instead of gen2, and would
> probably need redesiging the code) - i.e. MakeBigObjectGraph is too
> large, or other processing code takes too much time. But hopefully it
> can be seen here that the problem is keeping "stuff" alive too long (in
> real time) or making it too big, not the number of short-lived objects.
> If the objects truly are short-lived, collecting gen0 should deallocate
> it almost completely every time (i.e. promote almost nothing to gen1),
> thus making a gen1 collection quite rare, and therefore mid-life crisis
> very rare. LengthyProcessing() is more likely a source of problems,
> preventing the collection of "stuff".
>
>
> > http://blogs.msdn.com/ricom/archive/2003/12/04/41281.aspx
>
>
> -- Barry
>
> --
> http://barrkel.blogspot.com/

--
Sébastien
www.sebastienlorion.com
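
One way to do the measuring suggested in the quoted text is to sample GC.CollectionCount before and after the allocation-heavy work. The sketch below is only an illustration: the names GcCountSketch, CountCollections and the iteration count are made up for this example, and the numbers you get depend on the runtime version and GC mode.

```csharp
using System;

static class GcCountSketch
{
    // Returns how many gen0, gen1 and gen2 collections ran while 'work' executed.
    public static int[] CountCollections(Action work)
    {
        int gen0 = GC.CollectionCount(0);
        int gen1 = GC.CollectionCount(1);
        int gen2 = GC.CollectionCount(2);

        work();

        return new[]
        {
            GC.CollectionCount(0) - gen0,
            GC.CollectionCount(1) - gen1,
            GC.CollectionCount(2) - gen2
        };
    }

    // Allocates many small arrays that become garbage immediately
    // (hypothetical stand-in for the loop body in the pseudocode).
    public static long MakeAndDiscardShortLivedObjects(int iterations)
    {
        long total = 0;
        for (int i = 0; i < iterations; i++)
        {
            byte[] scratch = new byte[64];
            scratch[0] = (byte)i;
            total += scratch[0];
        }
        return total;
    }
}
```

Calling `GcCountSketch.CountCollections(() => GcCountSketch.MakeAndDiscardShortLivedObjects(5000000))` gives three deltas. If the objects really are short-lived, the gen1 and gen2 figures should stay small relative to gen0; a profiler is still needed to see which objects get promoted.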

Frans Bouma

2008-02-14 14:16:24 UTC

Permalink

Chris Tavares

2008-02-14 19:00:26 UTC

Permalink

Hugh Brown

2008-02-14 15:06:51 UTC

Permalink

Is your code the same as this:

public static bool IsQuotedBy(string sb, string quoteString)
{
    // Opening token at the start, mirrored token at the end.
    return sb.StartsWith(quoteString) && sb.EndsWith(Reverse(quoteString));
}

// Returns the characters of s in reverse order.
public static string Reverse(string s)
{
    StringBuilder sb = new StringBuilder();
    for (int i = s.Length - 1; i >= 0; i--)
        sb.Append(s[i]);
    return sb.ToString();
}

Brady Kelly

2008-02-14 15:21:48 UTC

Permalink

Hugh Brown

2008-02-15 03:34:07 UTC

Permalink

Re: ...and StringBuilder does not have the methods you use.

You mean StartsWith() and EndsWith()? Notice that I changed from StringBuilder to String.

Brady Kelly <***@CHASESOFTWARE.CO.ZA> wrote: No, I don't reverse the quote token (an oversight), and StringBuilder does
not have the methods you use.
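
Since StringBuilder has no StartsWith() or EndsWith(), the same checks can be written as extension methods that index the builder directly, without calling ToString(). This is a sketch under assumptions, not code from the thread: the class name and the two-token IsQuotedBy(open, close) signature are hypothetical, and the comparison is ordinal (char by char), whereas String.StartsWith(string) is culture-sensitive.

```csharp
using System;
using System.Text;

static class StringBuilderQuoteExtensions
{
    // True if the first value.Length chars of sb equal value (ordinal).
    public static bool StartsWith(this StringBuilder sb, string value)
    {
        if (value.Length > sb.Length) return false;
        for (int i = 0; i < value.Length; i++)
            if (sb[i] != value[i]) return false;
        return true;
    }

    // True if the last value.Length chars of sb equal value (ordinal).
    public static bool EndsWith(this StringBuilder sb, string value)
    {
        int offset = sb.Length - value.Length;
        if (offset < 0) return false;
        for (int i = 0; i < value.Length; i++)
            if (sb[offset + i] != value[i]) return false;
        return true;
    }

    // The caller names both tokens, so "reversed or not" is explicit.
    public static bool IsQuotedBy(this StringBuilder sb, string open, string close)
    {
        return sb.Length >= open.Length + close.Length
            && sb.StartsWith(open)
            && sb.EndsWith(close);
    }
}
```

For plain double quotes this is `sb.IsQuotedBy("\"", "\"")`; for a mirrored multi-character token it would be something like `sb.IsQuotedBy("<<", ">>")`. Passing both tokens avoids allocating a reversed string on every call.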

Brady Kelly

2008-02-15 04:28:02 UTC

Permalink

Per Bolmstedt

2008-02-14 18:56:47 UTC

Permalink

On Thu, 14 Feb 2008 11:42:01 -0500, Sébastien Lorion
<***@GMAIL.COM> wrote:

> Allocating objects in a tight loop or methods like GetHashCode()
> is still a big nono ...

Why? See Chris' recent post, for example. Repeating the mantra isn't really
helping. =]

Sébastien Lorion

2008-02-15 04:51:43 UTC

Permalink

I did use the word "tight" ... 100 ns in a loop taking 500 ns is still
20%. If you can take the allocation out of the loop, you just got a
nice save. And as far as I know, there is no reason to create objects
in a hashing function.
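
To illustrate the hashing remark, here is a minimal sketch of a key type with an allocating and a non-allocating hash. The type FieldKey and its fields are hypothetical, and the 17/31 multipliers are just a common convention, not anything prescribed in this thread.

```csharp
using System;

sealed class FieldKey
{
    private readonly string _name;
    private readonly int _index;

    public FieldKey(string name, int index)
    {
        _name = name;
        _index = index;
    }

    // Allocating version: builds a temporary string on every call.
    public int AllocatingHash()
    {
        return (_name + ":" + _index).GetHashCode();
    }

    // Non-allocating version: combines the field hashes arithmetically.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (_name == null ? 0 : _name.GetHashCode());
            hash = hash * 31 + _index;
            return hash;
        }
    }

    public override bool Equals(object obj)
    {
        FieldKey other = obj as FieldKey;
        return other != null && _name == other._name && _index == other._index;
    }
}
```

Both versions give equal results for equal keys; the difference is only that the second one creates no garbage when the key is used in a hash table lookup inside a loop.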

You may say, this is for edge cases. Yes and no ... If object
allocation was free, there would be no StringBuilder in the first
place and it would not have a constructor asking for its initial size,
etc.

Sébastien

--
Sébastien
www.sebastienlorion.com

Frans Bouma

2008-02-15 07:43:19 UTC

Permalink

> You may say, this is for edge cases. Yes and no ... If object
> allocation was free, there would be no StringBuilder in the first
> place and it would not have a constructor asking for its initial size,
> etc.

That initial size is there to prevent memory fragmentation from the memcpy
operations needed when the buffer has to be resized. I don't see how that
parameter relates to object creation speed. I think the main reason
StringBuilder exists is to avoid having lots of objects to collect. I file
that kind of cost under 'object destruction', not 'creation', though if you
see them as one action (as creating an object means it also has to be
collected at some point), you have a point.

FB

Jon Skeet

2008-02-15 08:38:37 UTC

Permalink

Frans Bouma wrote:
>> You may say, this is for edge cases. Yes and no ... If object
>> allocation was free, there would be no StringBuilder in the first
>> place and it would not have a constructor asking for its initial size,
>> etc.
>
> That initial size is to prevent memory fragmentation during memcpy
> actions when the buffer needs to be resized. I don't see a relevance with
> object creation speed and that parameter. I think the main reason the string
> builder is there is to avoid having lots of objects to collect. I file that
> kind of action under 'object destruction', not 'creation', though if you see
> that as one action (as creating an object means it also has to be collected at
> some point), you have a point.

Not just objects to collect - but data to copy.

Suppose object creation and memory allocation were both completely free
- but that you were concatenating a million single character strings
together, one at a time. You'd still end up copying 1000 billion bytes.
I suspect that allocation (a pointer bump) is cheaper than the copying
by the time the strings get reasonably large.

Likewise, giving StringBuilder an appropriate initial capacity doesn't
just avoid memory fragmentation but also extra copying.
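
The arithmetic behind the copying figure can be checked with a small sketch. The names are hypothetical, and the doubling buffer is a simplified model of a growable array; the actual growth strategy of StringBuilder differs between framework versions.

```csharp
using System;

static class CopyCostSketch
{
    // Chars copied when n one-character strings are joined with s = s + c:
    // step k writes a new string of length k, so the total is 1 + 2 + ... + n.
    public static long NaiveConcatCharsCopied(long n)
    {
        return n * (n + 1) / 2;
    }

    // Chars copied by a buffer that starts at initialCapacity and doubles
    // each time it fills up (simplified model).
    public static long DoublingBufferCharsCopied(long n, long initialCapacity)
    {
        long copied = n;                 // every char is written once
        long capacity = initialCapacity;
        while (capacity < n)
        {
            copied += capacity;          // old contents moved to the new buffer
            capacity *= 2;
        }
        return copied;
    }
}
```

For a million characters the naive count is 500,000,500,000 chars, which at two bytes per char is about 10^12 bytes, matching the "1000 billion bytes" above. In the doubling model the same input costs 2,048,560 char copies from an initial capacity of 16, and exactly 1,000,000 when the capacity is right from the start.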

Jon

Daniel Petersson

2008-02-15 08:35:39 UTC

Permalink

Object allocation in general on the .NET platform is really, really fast, BUT if you do it in a tight parse loop then it is expensive. I think that Sébastien's example with a 500 ns loop is proof enough that object allocation may hurt performance.

Please remember that, in my first post, I clearly stated that the choice (regex vs. hand-coded) depends on the development task. Sadly, a lot of people didn't notice this, and therefore a lot of posts are slightly or completely off topic. (Object allocation in tight parse loops isn't even comparable to building entity objects from database queries; in the latter case the object allocation cost is barely noticeable.)

FB: yes, the initial size passed to the StringBuilder ctor prevents memory fragmentation, BUT allocating the right amount from the start never hurts performance; you save a lot of "memcpy" operations due to the reduced need to grow the internal buffer.

Call perf: even though the differences are barely measurable, there are performance differences between the different call types. Here they are, ordered fastest to slowest.

1. static: static calls are faster for a simple reason; you don't need to push the "this" variable onto the stack
2. instance: normal instance calls are almost as fast as static calls, but they have the "this" overhead
3. delegate and virtual: they are slightly slower since they are indirect.
In most cases the chosen call type doesn't impact performance, but when you really, really need that extra bit of perf you have to understand the implications of the different call types.
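
For reference, the call types in the list can be written side by side as below. This is only an illustration with made-up names (Shape, CallKinds); it measures nothing, and since the JIT may inline small static and non-virtual instance methods, any real comparison needs a proper benchmark.

```csharp
using System;

class Shape
{
    // Static: no 'this' argument is passed.
    public static int StaticArea(int w, int h) { return w * h; }

    // Non-virtual instance: 'this' is passed implicitly.
    public int InstanceArea(int w, int h) { return w * h; }

    // Virtual: the target is looked up through the object's type at run time.
    public virtual int VirtualArea(int w, int h) { return w * h; }
}

static class CallKinds
{
    // Invokes the same computation through each kind of call.
    public static int[] CallAllWays(int w, int h)
    {
        Shape s = new Shape();
        Func<int, int, int> viaDelegate = s.InstanceArea; // indirect call through a delegate

        return new[]
        {
            Shape.StaticArea(w, h),
            s.InstanceArea(w, h),
            s.VirtualArea(w, h),
            viaDelegate(w, h)
        };
    }
}
```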

regards,
Daniel

Sébastien Lorion

2008-02-15 15:11:09 UTC

Permalink

On 2/15/08, Frans Bouma <***@xs4all.nl> wrote:
> That initial size is to prevent memory fragmentation during memcpy
> actions when the buffer needs to be resized. I don't see a relevance with
> object creation speed and that parameter. I think the main reason the string
> builder is there is to avoid having lots of objects to collect. I file that
> kind of action under 'object destruction', not 'creation', though if you see
> that as one action (as creating an object means it also has to be collected at
> some point), you have a point.
>
> FB

Yes, I see that as one action because one cannot go without the other.
But sure, StringBuilder has more than one benefit, as you and others
pointed out.

Sébastien

Peter Obiefuna

2008-02-15 16:59:23 UTC

Permalink

> Yes, I see that as one action because one cannot go without the other.
> But sure, StringBuilder has more than one benefits as you and others
> pointed out.
>
> Sébastien

I am struggling with seeing object creation and destruction as "one action"
... and then speculating on overall performance based on that 'seeing'. GC
work is not happening on my worker thread. It may be important in the grand
scheme, but that's as far as I am willing to go.
I interpreted the original poster's use of StringBuilder as a 'given'. I
imagine he's in the middle of this justifiable 'string-building' exercise
and then comes upon a decision to look for delimiters. Arguing about whether
the StringBuilder should be created at all would be out of scope and may
obscure the original problem.

A perhaps strident note I'd rather chip in here is that programmers who
target the .NET CLR should focus on writing maintainable code as a major
architectural ideal. For that reason, creating a few more properly
refactored classes with a single-responsibility concept is, in my opinion,
of higher value than shaving off a million nanoseconds from one task.
Putting this in perspective, these kinds of apps hop around networks quite
a lot, and there's much to gain from focusing on maintainable code that puts
more emphasis on optimizing the way it moves around the network.
P

Sébastien Lorion

2008-02-15 19:10:40 UTC

Permalink

GC is taking CPU time; being on another thread does not change that.
The time spent collecting/compacting is time not spent doing real
work. If the GC spent 1 min on each gen0 collection, you would surely
see this as part of the cost.

About your second point, please compare the 2 simple implementations
of IsQuotedBy below, both maintainable. The first one is around 10x faster.
That apparently insignificant change can make the difference between a
slow-as-molasses parser and a speedy one.

using System;
using System.Diagnostics;
using System.Text;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            const int IterationCount = 1000000;

            StringBuilder value = new StringBuilder("@asdf@");

            Stopwatch timer = new Stopwatch();

            // Time the allocation-free version.
            timer.Start();
            for (int i = 0; i < IterationCount; i++)
            {
                IsQuotedBy(value, "@");
            }
            timer.Stop();
            Console.WriteLine(timer.ElapsedTicks);

            // Reset so the second figure does not include the first run.
            timer.Reset();
            timer.Start();
            for (int i = 0; i < IterationCount; i++)
            {
                IsQuotedByWithAllocation(value, "@");
            }
            timer.Stop();
            Console.WriteLine(timer.ElapsedTicks);

            Console.ReadKey();
        }

        static bool IsQuotedBy(StringBuilder input, string quote)
        {
            if (input.Length < quote.Length * 2)
                return false;

            for (int i = 0; i < quote.Length; i++)
            {
                if (input[i] != quote[i])
                    return false;
            }

            // The closing token is the quote reversed, so walk it from the end.
            for (int i = quote.Length - 1; i >= 0; i--)
            {
                if (input[input.Length - i - 1] != quote[i])
                    return false;
            }

            return true;
        }

        static bool IsQuotedByWithAllocation(StringBuilder input, string quote)
        {
            if (input.Length < quote.Length * 2)
                return false;

            for (int i = 0; i < quote.Length; i++)
            {
                if (input[i] != quote[i])
                    return false;
            }

            string reverse = Reverse(quote);

            // Compare the tail of the input with the reversed quote, left to
            // right, so both versions accept the same inputs.
            int offset = input.Length - reverse.Length;
            for (int i = 0; i < reverse.Length; i++)
            {
                if (input[offset + i] != reverse[i])
                    return false;
            }

            return true;
        }

        static string Reverse(string input)
        {
            StringBuilder reversed = new StringBuilder(input.Length);

            for (int i = input.Length - 1; i >= 0; i--)
                reversed.Append(input[i]);

            return reversed.ToString();
        }
    }
}

--
Sébastien
www.sebastienlorion.com

Greg Young

2008-02-15 19:16:24 UTC

Permalink

Getting back to my original comment (which applies greatly here) ...

IsQuotedBy needs to be special cased for a single char ... Most of the
time it will be called with only 1 char (think quotes :)) and doing a
simpler ...

return string[0] == char && string[length - 1] == char

will be orders of magnitude faster ...
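
Spelled out as compilable C#, the single-char special case might look like the sketch below. The class name is hypothetical, and the length guard is an addition here so that an empty or one-character builder returns false instead of throwing.

```csharp
using System.Text;

static class SingleCharQuote
{
    // Two index reads and two comparisons; no loop and no allocation.
    public static bool IsQuotedBy(this StringBuilder sb, char quote)
    {
        return sb.Length >= 2
            && sb[0] == quote
            && sb[sb.Length - 1] == quote;
    }
}
```

Usage would be `field.IsQuotedBy('"')`, with the string overload kept for the rarer multi-character case.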

Cheers,

Greg

--
Studying for the Turing test

Sébastien Lorion

2008-02-15 19:24:47 UTC

Permalink

Sure ... I was conforming to original signatures. With one char,
reversing also becomes irrelevant.

About that, Brady, I have never been asked to support a multi-char
delimiter, and I have never seen one in other CSV-like parsers.

Sébastien

On 2/15/08, Greg Young <***@gmail.com> wrote:
> Getting back to my original comment (which applies greatly here) ...
>
> IsQuotedBy needs to be special cased for a single char ... Most of the
> time it will be called with only 1 char (think quotes :)) and doing a
> simpler ...
>
> return string[0] == char && string[length - 1] == char
>
> will be orders of magnitude faster ...
>
>
> Cheers,
>
> Greg
>
>
> On Fri, Feb 15, 2008 at 11:10 AM, Sébastien Lorion
> <***@gmail.com> wrote:
> > GC is taking cpu time, being on another thread does not change that.
> > The time spent collecting/compacting is less time spent doing real
> > work. If GC would spend 1 min for each gen0 collection, you would sure
> > see this as part of the cost.
> >
> > About your second point, please compare below 2 simple implementations
> > of IsQuotedBy, both maintainable. The first one is around 10x faster.
> > That apparently insignificant change can make the difference between a
> > slow as molasses parser and a speedy one.
> >
> > using System;
> > using System.Diagnostics;
> > using System.Text;
> >
> > namespace ConsoleApplication1
> > {
> > class Program
> > {
> > static void Main(string[] args)
> > {
> > const int IterationCount = 1000000;
> >
> > StringBuilder value = new StringBuilder("@asdf@");
> >
> > Stopwatch timer = new Stopwatch();
> >
> > timer.Start();
> > for (int i = 0; i < IterationCount; i++)
> > {
> > IsQuotedBy(value, "@");
> > }
> > timer.Stop();
> > Console.WriteLine(timer.ElapsedTicks);
> >
> > timer.Start();
> > for (int i = 0; i < IterationCount; i++)
> > {
> > IsQuotedByWithAllocation(value, "@");
> > }
> > timer.Stop();
> > Console.WriteLine(timer.ElapsedTicks);
> >
> > Console.ReadKey();
> > }
> >
> > static bool IsQuotedBy(StringBuilder input, string quote)
> > {
> > if (input.Length < quote.Length * 2)
> > return false;
> >
> > for (int i = 0; i < quote.Length; i++)
> > {
> > if (input[i] != quote[i])
> > return false;
> > }
> >
> > for (int i = quote.Length - 1; i >= 0; i--)
> > {
> > if (input[input.Length - i - 1] != quote[i])
> > return false;
> > }
> >
> > return true;
> > }
> >
> > static bool IsQuotedByWithAllocation(StringBuilder input, string quote)
> > {
> > if (input.Length < quote.Length * 2)
> > return false;
> >
> > for (int i = 0; i < quote.Length; i++)
> > {
> > if (input[i] != quote[i])
> > return false;
> > }
> >
> > string reverse = Reverse(quote);
> >
> > for (int i = 0; i < reverse.Length; i++)
> > {
> > if (input[input.Length - i - 1] != reverse[i])
> > return false;
> > }
> >
> > return true;
> > }
> >
> > static string Reverse(string input)
> > {
> > StringBuilder reversed = new StringBuilder(input.Length);
> >
> > for (int i = input.Length - 1; i >= 0; i--)
> > reversed.Append(input[i]);
> >
> > return reversed.ToString();
> >
> >
> > }
> > }
> > }
> >
> >
> > On 2/15/08, Peter Obiefuna <***@hotmail.com> wrote:
> > > > Yes, I see that as one action because one cannot go without the other.
> > > > But sure, StringBuilder has more than one benefits as you and others
> > > > pointed out.
> > > >
> > > > Sébastien
> > >
> > >
> > >
> > > I am struggling with seeing object creation and destruction as "one action"
> > > ... and then speculate on overall performance based on that 'seeing'. GC
> > > work is not happening on my worker thread. It may be important in the grand
> > > scheme but that's how far am willing to go.
> > > I interpreted the original poster's use of StringBuilder as a 'given'. I
> > > imagine he's in the middle of this justifiable 'string-building' exercise
> > > and then comes upon a decision to look for delimiters. Arguing that creating
> > > StringBuilder for the sake of creating a class would be out of scope and may
> > > obfuscate the original problem.
> > >
> > > A perhaps strident note I'd rather chip in here is that programmers who
> > > target .NET CLR should focus on writing maintainable code as a major
> > > architectural ideal. For that reason, creating a few more properly
> > > refactored classes with a single-responsibility concept, in my opinion, is
> > > of higher value than than shaving off a million nanoseconds from one task.
> > > Putting this in perspective, these kinds of apps hop around networks quite
> > > much and there's much to gain from focusing on maintainable code that lays
> > > more emphasis on optimizing the way it moves around the network.
> > > P
> > >
> > > --------------------------------------------------
> > > From: "Sébastien Lorion" <***@GMAIL.COM>
> > > Sent: Friday, February 15, 2008 8:11 AM
> > >
> > > To: <DOTNET-***@DISCUSS.DEVELOP.COM>
> > > Subject: Re: [DOTNET-CLR] StringBuilder Extension: IsQuotedBy
> > >
> > >
> > > > On 2/15/08, Frans Bouma <***@xs4all.nl> wrote:
> > > > >> That initial size is to prevent memory fragmentation during memcpy
> > > > >> actions when the buffer needs to be resized. I don't see a relevance
> > > > >> with object creation speed and that parameter. I think the main
> > > > >> reason the string builder is there is to avoid having lots of objects
> > > > >> to collect. I file that kind of action under 'object destruction',
> > > > >> not 'creation', though if you see that as one action (as creating an
> > > > >> object means it also has to be collected at some point), you have a
> > > > >> point.
> > > >>
> > > >> FB
> > > >
> > > > Yes, I see that as one action because one cannot go without the other.
> > > > > But sure, StringBuilder has more than one benefit, as you and others
> > > > pointed out.
> > > >
> > > > Sébastien
> > > >
> > >
> > > > ===================================
> > > > This list is hosted by DevelopMentor(R) http://www.develop.com
> >
> > > >
> > > > View archives and manage your subscription(s) at
> > > > http://discuss.develop.com
> > > >
> > >
> > > ===================================
> > > This list is hosted by DevelopMentor(R) http://www.develop.com
> >
> > >
> > > View archives and manage your subscription(s) at http://discuss.develop.com
> > >
> >
> >
> > --
> > Sébastien
> > www.sebastienlorion.com
> >
> >
> >
>
>
>
>
> --
>
> Studying for the Turing test
>

--
Sébastien
www.sebastienlorion.com

Brady Kelly

2008-02-16 07:30:00 UTC

> About that, Brady, I have never been asked to support a multi-char
> delimiter, and I have never seen one in other CSV-like parsers.
>
> Sébastien

Yes, I'm not faulting you, as the need even on my side is limited. One or
two of our accounting packages use a double pipe, and if I had not been told
to exclude income data scenarios, I would have used your CSV parser after
substituting a single character for the multi-character delimiters. I still
will, but not for some time.

Per Bolmstedt

2008-02-15 09:17:48 UTC

On Fri, 15 Feb 2008 09:35:39 +0100, Daniel Petersson
<***@CEFALO.SE> wrote:

> in my first post, I clearly stated that the choice (regex vs
> hand-coded) depended on the development task; sadly, a lot of people
> didn't notice this, and therefore a lot of posts are slightly or
> completely off topic.

No, the reason it went OT for me is that the nonchalant statement "object
creation in .NET is expensive" has been used at least twice to support
design strategies, and subsequently gone unchallenged. And because I find
throwaway generalizations like these harmful, I can't find it within myself
to let them pass.

And as elaboration has shown, there is *nothing* to back up this claim; one
poster has rephrased it as "object creation is expensive just like
everything is expensive" (tautology), another as "object creation is
expensive if you do it in a way that leads to expensive consequences"
(non-sequitur), etc.

I've seen senior developers make strict guidelines forbidding certain things
on tautologies and non-sequiturs like these, so I have first-hand experience
of their dangers.

"Thread-OT but not list-OT" is always OK, right?

Sébastien Lorion

2008-02-15 15:08:03 UTC

Just as you cannot let someone say that allocation is expensive, I
cannot let someone say that it should not be taken into account. As
Frans pointed out, object destruction also needs to be taken into
account (yes, I consider it part of the cost). That is one thing
people seem to forget when moving to a GC environment. In C++, the
whole allocation/destruction cycle was more explicit, and people were
usually more careful.

Just look at the original topic. The problem is to find surrounding
delimiters and remove them. This can all be done easily with no
allocation. Why would it be necessary to reverse the delimiter quote
by creating a StringBuilder? I don't want to nitpick on this
particular "Outlook-compiled" code, but I hope you see my point.
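
A minimal sketch of what an allocation-free check and removal of surrounding delimiters could look like. The class name and the TrimQuotes helper are hypothetical names chosen for illustration; only the IsQuotedBy name comes from the original post, and this is one possible shape rather than anyone's actual implementation.

```csharp
using System;
using System.Text;

public static class StringBuilderQuoteExtensions
{
    // Returns true when sb both starts and ends with quote.
    // Only indexer reads are used, so no intermediate string is created.
    public static bool IsQuotedBy(this StringBuilder sb, string quote)
    {
        if (sb == null) throw new ArgumentNullException(nameof(sb));
        if (string.IsNullOrEmpty(quote)) return false;

        // The opening and closing quotes must not overlap.
        if (sb.Length < 2 * quote.Length) return false;

        int tailStart = sb.Length - quote.Length;
        for (int i = 0; i < quote.Length; i++)
        {
            // Compare the same quote character against head and tail.
            if (sb[i] != quote[i] || sb[tailStart + i] != quote[i])
                return false;
        }
        return true;
    }

    // Hypothetical helper: strips one leading and one trailing quote
    // in place and reports whether anything was removed.
    public static bool TrimQuotes(this StringBuilder sb, string quote)
    {
        if (!sb.IsQuotedBy(quote)) return false;
        sb.Remove(sb.Length - quote.Length, quote.Length); // tail first
        sb.Remove(0, quote.Length);                        // then head
        return true;
    }
}
```

Removing the tail before the head keeps the tail offset valid without recomputing it. The length guard also rejects a builder that holds a single quote string, which would otherwise count as both its opening and its closing delimiter.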

Rereading the posts, I don't see anyone claiming that what they say is a
mantra, a general guideline, or whatever you want to call it. The
comments were made in the context of a parser API, where performance is
paramount. That said, a guideline such as doing as much as possible
outside a loop is common sense and very safe, I would say.

Sébastien

--
Sébastien
www.sebastienlorion.com

Brady Kelly

2008-02-15 15:55:27 UTC

The problem is not just removing the delimiter quotes; other pieces of the
input must be conditionally removed as well. So for each input string, if I
were to say currentToken = currentToken.Replace(...), up to three or four
strings would be created, which a mutable StringBuilder avoids.
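
A rough sketch of the difference being described, assuming three made-up clean-up rules (the TokenCleaner class, the Clean method and the patterns themselves are hypothetical). Each call to String.Replace that changes something yields a new string, whereas StringBuilder.Replace edits the builder's own buffer and returns the same instance, so the calls can be chained without intermediate strings.

```csharp
using System;
using System.Text;

public static class TokenCleaner
{
    // Applies a few illustrative clean-up rules to the token in place.
    // The rules are assumptions for the example, not the poster's real ones.
    public static StringBuilder Clean(StringBuilder token)
    {
        if (token == null) throw new ArgumentNullException(nameof(token));

        return token.Replace("\r", "")   // drop carriage returns
                    .Replace("\t", " ")  // tabs become single spaces
                    .Replace("||", "|"); // collapse a double pipe
    }
}
```

One builder can be reused across fields by setting token.Length = 0 before appending the next value. A string is only materialised when ToString() is finally called on the cleaned token.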

...
> Just look at the original topic. The problem is to find surrounding
> delimiters and remove them. This can all be done easily with no
> allocation. Why would it be necessary to reverse the delimiter quote
> by creating a StringBuilder ? I don't want to nitpick on this
> particular "outlook compiled" code, but you see my point I hope.
...

> Sébastien

Frans Bouma

2008-02-15 17:41:57 UTC

> The problem is not just to remove the delimiter quotes, it is also to
> conditionally remove other pieces of the input, so for each input string, if
> I were to say currentToken = currentToken.Replace(...), up to three or four
> string creations would take place, without a mutable StringBuilder.

I'm not sure what you're doing with the texts or why they have to
change, but before you end up with a lot of code that handles 1001 special
cases, it may be a good idea to look into creating a proper parser for your
task, so you can define what to convert with a set of rules (a grammar) and
keep it more maintainable.

If the set of tasks to be done on the text is small and the tasks are
very straightforward, no problem; otherwise I'd opt for a parser.

FB

Brady Kelly

2008-02-16 07:24:44 UTC

Peter Ritchie

2008-09-11 16:47:17 UTC

It's because GridViewRowCollection implements only the non-generic
System.Collections.IEnumerable (for whatever reason). That is essentially
the same as System.Collections.Generic.IEnumerable<Object>, which is why
everything is an Object. VB.NET is happy to accept that at compile time,
leaving it to be dealt with at runtime.

You need to tell the compiler the information it can't infer, for example:
Dim gridList As IEnumerable(Of GridViewRow) = customerGrid.Rows.Cast(Of GridViewRow)()
Dim customerList = From cust In gridList _
                   Select cust.Cells(4).Text _
                   Distinct
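
The same idea as a small C# sketch, using ArrayList as a stand-in for GridViewRowCollection (an assumption made so the example needs no ASP.NET reference; CastDemo and DistinctNames are hypothetical names). Both collection types expose only the non-generic IEnumerable, so the query needs the element type supplied explicitly.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class CastDemo
{
    // ArrayList stands in for GridViewRowCollection here: its elements
    // are seen as object until an element type is supplied.
    public static List<string> DistinctNames(ArrayList rows)
    {
        // Cast<string>() provides the type the compiler cannot infer,
        // after which the query is early-bound.
        return rows.Cast<string>().Distinct().ToList();
    }
}
```

Cast fails at runtime with an InvalidCastException if an element is not of the requested type; Enumerable.OfType is the variant that skips such elements instead.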

Cheers -- Peter

On Thu, 11 Sep 2008 10:46:09 -0500, Booth, Bill
<***@PANAMERICANLIFE.COM> wrote:

>Hi All,
>I am trying to run a Linq query on a GridView Rows Collection that will
>select the distinct contents of one cell. The query below will do the
>job; however, it uses late binding. The variables cust and customerList
>are both object types. I just cannot seem to get around this.
>Any advice will be appreciated.
>Bill
>
>Dim gridList As GridViewRowCollection = CustomerGrid.Rows
>
>
>Dim customerList = From cust In gridList _
> Select cust.Cells(4).Text _
> Distinct
>
>For Each i In customerList
> Debug.WriteLine(i)
>Next
>

Booth, Bill

2008-09-11 17:18:10 UTC
