OT: help needed to debug Perl script

M. Fioretti-2
Greetings,

A few weeks ago I quickly put together a Perl script to parse big CSV
files, for a project I am working on (I need to do this several times a
day, always with new data). All was fine until yesterday, when the
script started behaving in a consistent, but totally wrong way.

The script runs with "use strict" and the -w switch, but I only get a few
warnings about uninitialized values in certain statements.

The relevant part of the code is this:

    147 my $keycounter = 1;
    148
    149 foreach my $qtq (sort keys %all) {
    150
    151    printf "\nALLCHECK: %6.6s >> %s;\n", $keycounter, $qtq;
    152    $keycounter++;
    153 }
    154
    155 foreach my $qq (sort keys %all) {
    156    $url = $qq;
    157    print "\nADDINGURX: $url;\n";
    158    print "\nADDINGURQ: $qq;\n";

Lines 157 and 158, and lines 147 to 153, were added only for diagnostics.
What happens is that, when I dump the script output to a file, i.e.:

./myscript.pl > logfile

then:

a) logfile contains 26k+ lines starting with "ALLCHECK", i.e. the %all
hash contains 26k+ keys;

b) the *same* logfile contains:

    ~4700 lines starting with ADDINGURX
    ZERO lines starting with ADDINGURQ

In other words:

the script worked perfectly for weeks. Starting yesterday, the same
script says at line 151 that the hash has 26k keys, and 5 lines later,
that the same hash has only 4700 keys???

I honestly have no idea what is happening, or why it only started
happening now. The input CSV files (which I cannot share, sorry, not my
data...) are different every time, so I initially thought that the latest
ones contained some weird character that confuses my code. But if that
were the case, even the first print statement would only print ~4700
lines.
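A minimal sketch of how the key count could be checked from inside the script instead of by grepping the redirected log, assuming %all is an ordinary (untied) hash. `report_key_count` and the sample URL keys are hypothetical stand-ins, not part of the original script. The counts go to STDERR, which Perl leaves unbuffered and which `> logfile` does not capture:

```perl
#!/usr/bin/perl
use strict;
use warnings;

$| = 1;    # autoflush STDOUT so redirected output is written as it is printed

# Hypothetical helper: report a hash's key count on STDERR, which is
# unbuffered and is not captured by "./myscript.pl > logfile".
sub report_key_count {
    my ($label, $href) = @_;
    my $n = scalar keys %$href;
    print STDERR "$label: $n keys\n";
    return $n;
}

# Stand-in for the real %all (assumption: URL keys, values unused here).
my %all;
$all{"http://example.com/page$_"} = 1 for 1 .. 5;

my $before = report_key_count('before loop 1', \%all);

my $printed = 0;
foreach my $qtq (sort keys %all) {
    $printed++;    # count iterations instead of grepping the log afterwards
}

my $after = report_key_count('before loop 2', \%all);

# For an untouched plain hash both counts must match, and the first loop
# must have run exactly once per key.
warn "hash changed between loops: $before vs $after\n" if $before != $after;
warn "loop 1 saw $printed keys, hash has $before\n"   if $printed != $before;
```

Run as `./myscript.pl > logfile`, the counts appear on the terminal while the data still goes to the file. If these in-script counts agree but the log file does not, that would point at what happens to the output after it leaves Perl.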

So, any help is appreciated,

Thanks,
Marco

--
ubuntu-users mailing list
[hidden email]
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-users

Re: OT: help needed to debug Perl script

Ken D'Ambrosio
I'm sorry, but lacking both source data and the full script, this is
much akin to finding the black cat in the coal cellar at midnight that
isn't there.  (And that's not even factoring in the fact that I left
Perl behind some eight years ago when I realized that Ruby did all the
cool things Perl did, but didn't look like line noise when you were
done.  Arguments, of course, can be made for Python, but I really liked
Perl's regex handling, and Ruby pretty much maintains that.)

Since you're unable to share the data, I suggest, instead, taking the
top ~160 lines of a working dataset and of a non-working dataset, and
trying to see what's going wrong, and where.  Perl may not be cool, but
I promise you, it's not suddenly changing its mind on how to handle
stuff.  Something is inconsistent between the datasets.  Pay special
attention to possible Unicode intrusion, which can be tricky to detect.
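One way to act on the Unicode suggestion is to compare a working and a non-working file byte by byte. The sketch below is an illustration, not anything from the original script: it flags every line containing a byte outside printable ASCII (tab, LF and CR allowed), which catches UTF-8 sequences, byte-order marks and stray control characters alike. The script and file names are hypothetical, and a good file may legitimately contain some non-ASCII bytes; the point is to compare the two reports:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a description of every line containing a byte outside printable
# ASCII (tab, LF and CR are allowed). Read the file in :raw mode so that
# UTF-8 sequences, byte-order marks and control characters all show up.
sub suspicious_lines {
    my ($fh) = @_;
    my @hits;
    while (my $line = <$fh>) {
        if ($line =~ /([^\x09\x0A\x0D\x20-\x7E])/) {
            push @hits, sprintf('line %d: first odd byte 0x%02X', $., ord $1);
        }
    }
    return @hits;
}

# Usage (hypothetical file names): perl scan_csv.pl good.csv bad.csv
foreach my $file (@ARGV) {
    open my $fh, '<:raw', $file or die "Cannot open $file: $!\n";
    print "$file: $_\n" for suspicious_lines($fh);
    close $fh;
}
```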

-Ken


On 2018-10-17 01:05, M. Fioretti wrote:
> ...

Re: OT: help needed to debug Perl script

M. Fioretti-2
On 2018-10-17 07:18, Ken D'Ambrosio wrote:
> I'm sorry, but lacking both source data and the full script, this is
> much akin to finding the black cat in the coal cellar at midnight that
> isn't there.

Hello Ken, and thanks for the prompt answer.

I am painfully aware that without the full source data things are much
harder than they should be, but I cannot do anything about it.

As far as the Perl code goes... I cannot release that either, for the
same reason, but I partially disagree with you there. It is probably my
fault, meaning that I may not have described the problem in the best
way. Let me try it this way:

a) Consider a Perl hash with lots of keys (25k+ in this case)

b) What on Earth could make two **consecutive** and practically
identical statements ("sort and print all the keys of this specific
hash") print 25k+ lines the first time, and ~15% of that the second
time?

Yes, it is very likely that the latest batch of input data contains weird
characters. But why don't they cause any problem until after the first
foreach loop? The first problem here is the *difference* in the outputs
of those two consecutive statements. Isn't what happens before or after
(= the full script code that I cannot share) irrelevant?

Again, thanks for any comment,

Marco



--
http://mfioretti.com

Re: OT: help needed to debug Perl script

Joel Rees
Can you post just the two lines where the behavior occurs, maybe obfuscating hash names and whatnot?

Are you sure you are not using objects with hidden, and perhaps conflicting, methods? That's a common cause of inconsistent behavior.
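To make the "hidden methods" idea concrete in Perl terms: if %all were tied to a class, every `keys %all` would call that class's FIRSTKEY/NEXTKEY methods, so two identical statements could return different key lists. The sketch below is purely illustrative: `ShrinkingHash` is a made-up class, and nothing in this thread says the real %all is tied. The one-line `tied %all` check is the part that would carry over to a real script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustration only: a hypothetical tied-hash class whose hidden iteration
# methods drop keys starting with "b" on every pass after the first.
package ShrinkingHash;
use Tie::Hash;                 # provides Tie::StdHash (plain-hash storage)
our @ISA = ('Tie::StdHash');

my $pass = 0;                  # how many times iteration has started

sub FIRSTKEY {
    my ($self) = @_;
    $pass++;
    keys %$self;               # reset the underlying hash's iterator
    return NEXTKEY($self);
}

sub NEXTKEY {
    my ($self) = @_;
    while (defined(my $k = each %$self)) {
        return $k unless $pass > 1 && $k =~ /^b/;
    }
    return undef;              # signals the end of iteration
}

package main;

tie my %all, 'ShrinkingHash';
%all = (apple => 1, banana => 2, cherry => 3, blueberry => 4);

# Cheap check for a real script: is %all tied to some class?
print "%all is tied to ", ref(tied %all), "\n" if tied %all;

my @first  = sort keys %all;   # the same statement twice...
my @second = sort keys %all;   # ...but a different result
print scalar(@first), " keys, then ", scalar(@second), " keys\n";
```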

On Wed, Oct 17, 2018 at 14:50, M. Fioretti <[hidden email]> wrote:
> ...

Re: OT: help needed to debug Perl script

Colin Law
On Wed, 17 Oct 2018 at 06:07, M. Fioretti <[hidden email]> wrote:
> ...
>     157     print "\nADDINGURX: $url;\n";
>     158     print "\nADDINGURQ: $qq;\n";
> ...
>     ~4700 lines starting with ADDINGURX
>     ZERO lines starting with ADDINGURQ

Do you mean that line 157 is printing ok but the output from line 158
never appears?
Are you sure there is not another line there somewhere printing ADDINGURX?
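Colin's second question can be answered mechanically. A small sketch (hypothetical script name `find_tag.pl`) that lists every line of a source file containing a literal tag, with its line number, in the same spirit as `grep -n`. It only finds literal occurrences, not strings assembled at run time:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# List every line of a file that contains a literal tag, prefixed with its
# line number, so duplicate or forgotten print statements stand out.
sub lines_mentioning {
    my ($fh, $tag) = @_;
    my @hits;
    while (my $line = <$fh>) {
        push @hits, sprintf('%d: %s', $., $line) if index($line, $tag) >= 0;
    }
    return @hits;
}

# Usage (hypothetical script name): perl find_tag.pl myscript ADDINGURX
if (@ARGV == 2) {
    my ($file, $tag) = @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    print for lines_mentioning($fh, $tag);
    close $fh;
}
```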

Colin

SOLVED (not completely...): OT: help needed to debug Perl script

M. Fioretti-2
On 2018-10-17 12:06, Colin Law wrote:

> On Wed, 17 Oct 2018 at 06:07, M. Fioretti <[hidden email]>
> wrote:
>> ...
>>     157     print "\nADDINGURX: $url;\n";
>>     158     print "\nADDINGURQ: $qq;\n";
>> ...
>>     ~4700 lines starting with ADDINGURX
>>     ZERO lines starting with ADDINGURQ
>
> Do you mean that line 157 is printing ok but the output from line 158
> never appears?
> Are you sure there is not another line there somewhere printing
> ADDINGURX?

Answering (indirectly) also to Joel:


The snippet of the script that I posted is part of the actual output of

#> cat -n myscript

So this code, from my original message:

    147 my $keycounter = 1;
    148
    149 foreach my $qtq (sort keys %all) {
    150
    151    printf "\nALLCHECK: %6.6s >> %s;\n", $keycounter, $qtq;
    152    $keycounter++;
    153 }
    154
    155 foreach my $qq (sort keys %all) {
    156    $url = $qq;
    157    print "\nADDINGURX: $url;\n";
    158    print "\nADDINGURQ: $qq;\n";

is lines 147 to 158 of the complete script, and consequently yes, I was
sure that there was no other Perl code at all playing tricks here.

What I have been trying to say, maybe badly, is:

a) the above is part of the actual code
b) I run the script, dumping the output to a file for further processing:

    #> myscript > datadump

c) and I get different numbers of lines from the three statements
(again, what follows is ACTUAL output of grep at the shell prompt):

#> grep -c ^ALLCHECK datadump   (=line 151 prints 26080 keys from the hash)
26080
#> grep -c ^ADDINGURX datadump  (=line 157 prints only 4732 keys from the hash)
4732
#> grep -c ^ADDINGURQ datadump  (=line 158 prints only 4732 keys from the hash)
4732
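The three `grep -c` runs can also be done in a single pass over the dump. A sketch, assuming the tags only count when they start a line, as with `grep -c ^TAG`; `count_tags` and `count_tags.pl` are hypothetical names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count lines starting with each tag in one pass over the file,
# equivalent to running `grep -c ^TAG file` once per tag.
sub count_tags {
    my ($fh, @tags) = @_;
    my %count;
    $count{$_} = 0 for @tags;
    while (my $line = <$fh>) {
        foreach my $tag (@tags) {
            $count{$tag}++ if index($line, $tag) == 0;
        }
    }
    return \%count;
}

# Usage (hypothetical script name): perl count_tags.pl datadump
if (@ARGV == 1) {
    my @tags = qw(ALLCHECK ADDINGURX ADDINGURQ);
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my $count = count_tags($fh, @tags);
    close $fh;
    printf "%-10s %d\n", $_, $count->{$_} for @tags;
}
```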

Now the "solution":

After looking at the whole flow from scratch, I found out that the
problem seems to be 100% *outside* that specific Perl script, and is
somehow even more confusing (for me at least). But that deserves a
different thread, coming in a few minutes.

Thanks!!!

Marco
