Python

Berion

Developer
Any Python 3 magicians on the forum? I have a script made by ChatGPT :D but I don't know how to fix it. What I want to do is skip the first 16 bytes, then read the rest in 2352-byte chunks and strip each chunk from 2352 down to 2336 bytes (which means converting the entire disc image). I could do that in bash, but somehow this interested me and I cannot find a solution myself.

I don't see any sense in the first def, though. The interpreter returns: NameError: name 'chunk_size' is not defined
Code:
input_filename = '2352.bin'
output_filename = '2336.bin'

def extract_data(input_filename, output_filename):
   chunk_size = 2352
   extract_size = 2336
   offset = 16

with open(input_filename, 'rb') as input_file, open(output_filename, 'wb') as output_file:
   while True:
     chunk = input_file.read(chunk_size)
     if not chunk:
       break
     if len(chunk) < chunk_size:
       break

extracted_data = chunk[offset:offset+extract_size]
output_file.write(extracted_data)

if __name__ == '__main__':
   extract_data(input_filename, output_filename)
   print(f"Data extracted from {input_filename} and saved to {output_filename}")
 
Clearly, ChatGPT is not the best tool for generating code (GitHub Copilot is, but you have to write some code yourself to trigger its autocomplete).

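Analyzing the result, extract_data is useless as written: the with block is not indented under it, so it runs at module level, where the variables chunk and chunk_size are not defined (chunk_size only exists as a local variable inside extract_data). I know that in Python you can declare variables while setting them, but I personally find that practice horrible, unreadable and unmaintainable.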
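The NameError comes from Python's function scoping, not from a missing import: a variable assigned inside a function is local to it, and code at module level cannot see it. A hypothetical minimal reproduction (the function name `set_sizes` is made up for illustration):

```python
# Hypothetical minimal reproduction of the scoping problem.
def set_sizes():
    chunk_size = 2352  # local variable: only visible inside set_sizes()
    return chunk_size

try:
    print(chunk_size)  # module level: no global named chunk_size exists yet
except NameError as err:
    message = str(err)  # "name 'chunk_size' is not defined"

# Fix: either indent the code that uses chunk_size into the function,
# or return the value and bind it at module level.
chunk_size = set_sizes()
```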

So, first we can declare both variables chunk and chunk_size as follows:

chunk = b""  # bytes, since the file is read in binary mode
chunk_size = 2352


The with open body is not clear (its indentation is broken). A more explicit way to handle the file is as follows:

Code:
file = open(input_filename, "rb") # Open the file in read binary mode

# Using while loop to iterate the file data
while True:
    chunk = file.read(chunk_size)
    if not chunk: # read() returns an empty bytes object (b"") at end of file
        break
    # Processing the chunk of binary data
    print(f"Read {len(chunk)} bytes: {chunk}")

file.close() # Close the file when done

Of course, you must add the output file logic in there.
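For reference, a minimal sketch of the original ChatGPT script with the indentation fixed, so the loop runs inside the function and writes to the output file. It assumes every sector is a complete 2352-byte chunk and that the first 16 bytes of each one are dropped; like the original, it stops at a short trailing chunk. The default values are the thread's numbers, not something read from the image.

```python
# Sketch: the ChatGPT script with its indentation fixed.
# Assumptions: 2352-byte sectors, drop the first 16 bytes of each one.
CHUNK_SIZE = 2352
OFFSET = 16
EXTRACT_SIZE = 2336

def extract_data(input_filename: str, output_filename: str,
                 chunk_size: int = CHUNK_SIZE, offset: int = OFFSET,
                 extract_size: int = EXTRACT_SIZE) -> int:
    """Write extract_size bytes of every full chunk, starting at offset.

    Returns the number of chunks converted.
    """
    count = 0
    with open(input_filename, "rb") as input_file, \
         open(output_filename, "wb") as output_file:
        while True:
            chunk = input_file.read(chunk_size)
            if len(chunk) < chunk_size:
                # b"" at end of file, or an incomplete trailing chunk: stop here
                break
            output_file.write(chunk[offset:offset + extract_size])
            count += 1
    return count
```

Usage would be `extract_data("2352.bin", "2336.bin")`; the `with` statement closes both files even if an error occurs.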

I'm not quite sure about this line, though:

What I wanted to do is skipping first 16 bytes and strip all the next reading chunks by 2352 bytes from 2352 to 2336 bytes only

For skipping the first 16 bytes, you can add this line right after the file opening statement:

Code:
file = open(input_filename, "rb") # Open the file in read binary mode

file.read(16) # Skips the first 16 bytes

# Using while loop to iterate the file data
while True:
    chunk = file.read(chunk_size)
    if not chunk: # read() returns an empty bytes object (b"") at end of file
        break
    # Processing the chunk of binary data
    print(f"Read {len(chunk)} bytes: {chunk}")

file.close() # Close the file when done

But I don't quite understand what you mean by "strip all the next reading chunks by 2352 bytes from 2352 to 2336 bytes".
 
Can't you just use something like this?
Code:
INPUT_FILE="2352.bin"
OUTPUT_FILE="2336.bin"
with open(OUTPUT_FILE,"wb") as fn:
    in_file=open(INPUT_FILE,"rb")
    # skip first 16 bytes
    in_file.seek(16)
    # read next 2336 bytes
    chunk = in_file.read(2336)
    fn.write(chunk)
    in_file.close()
 

If you want to read the 2336-byte chunk right after the first 16 bytes, yes, your code is correct.
 
This is how I understood the question, but I'm not sure it's correct.
Also, what I don't know is:
is it one larger file (e.g. 23520 bytes) which contains multiple images that need to be extracted, or are there multiple files of the same or different sizes?
 
The idea is to take the input file, skip the first 16 bytes, then read the rest of the file until EOF in 2352-byte chunks, strip each of them from 2352 to 2336 bytes, and of course merge them in order. This literally converts a Mode 2 Form 2 image from full to the CD XA standard, which BTW is sufficient for all PSX games except those protected by LibCrypt. Why convert them in the first place? Because I'm looking for an easy way to extract XA/STR files to confirm something, and no tools allow that, because they assume that the EDC field contains EDC, while in reality (but I'm not sure about that) it contains user data to fool RAW TAO readers back in the day (RAW DAO 96 wasn't on the market yet when Sony was designing it).

^^
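For context on the numbers: in a raw 2352-byte CD sector, the first 12 bytes are a sync pattern (00, ten FF bytes, 00) and the next 4 are a header (a BCD-encoded minute/second/frame address plus a mode byte), so dropping 16 bytes per sector leaves the 2336-byte Mode 2 payload. A small sketch that checks this layout on one sector; the helper names are illustrative, and it does not interpret the XA subheader or the EDC at all.

```python
SYNC = b"\x00" + b"\xff" * 10 + b"\x00"  # 12-byte sync pattern
RAW_SECTOR = 2352
HEADER_END = 16                          # 12 sync bytes + 4 header bytes

def bcd_to_int(value: int) -> int:
    # header address bytes are stored as binary-coded decimal
    return (value >> 4) * 10 + (value & 0x0F)

def split_sector(sector: bytes) -> tuple[tuple[int, int, int], int, bytes]:
    """Split one raw sector into ((minute, second, frame), mode, payload)."""
    if len(sector) != RAW_SECTOR or sector[:12] != SYNC:
        raise ValueError("not a raw 2352-byte sector with a sync pattern")
    minute, second, frame = (bcd_to_int(b) for b in sector[12:15])
    mode = sector[15]
    payload = sector[HEADER_END:]  # 2336 bytes for a Mode 2 sector
    return (minute, second, frame), mode, payload
```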

@vetzki It works, but only with the first chunk, producing a 2336-byte file.
@GuilloteTesla I'm still getting an error:
Code:
NameError: name 'input_filename' is not defined
on line 1.

Maybe I'm missing some imported libraries? My Python version is 3.10.12.
 
Sorry @Berion, my bad. Was missing the first line that declares the input filename.

Code:
input_filename = '2352.bin'

file = open(input_filename, "rb") # Open the file in read binary mode

file.read(16) # Skips the first 16 bytes

# Using while loop to iterate the file data
while True:
    chunk = file.read(chunk_size)
    if not chunk: # read() returns an empty bytes object (b"") at end of file
        break
    # Processing the chunk of binary data
    print(f"Read {len(chunk)} bytes: {chunk}")

file.close() # Close the file when done

Regarding the stripping, what you want to get are the first 2336 bytes of each chunk, discarding the remaining 16 bytes, and then reading the next chunk?
 
Exactly, but the cycle starts by skipping the first 16 bytes, only once. ^^

With the code from above:
Code:
x@y:/mnt/ramdisk$ python3 test5.py
Traceback (most recent call last):
  File "/mnt/ramdisk/test5.py", line 7, in <module>
  chunk = file.read(chunk_size)
NameError: name 'chunk_size' is not defined
 
Something like this should work:

Code:
#!/usr/bin/python3

import os, argparse

def create_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("-i","--input-file",help="specify input file",required=True)
    parser.add_argument("-o","--output-file",help="specify output file base name (default: out)",default="out")
    parser.add_argument("-cs","--chunk_size",help="used chunk size (default: 2352)",default=2352,type=int)
    parser.add_argument("-es","--extract_size",help="used extracted bytes after offset (default: 2336)",default=2336,type=int)
    parser.add_argument("-os","--offset",help="skip n bytes at the start of each chunk (default: 16)",default=16,type=int)
    parser.add_argument("--check-size",nargs="?",const=True,default=False)
    return(parser.parse_args())

def check_file_size(input_file: str, chunk_size: int):
    # use this if you want to ensure n number of same sized chunks
    if (os.stat(input_file).st_size % chunk_size != 0):
        raise Exception("size mismatch")

def extract_data(chunk:bytes,offset: int, extract_size: int) -> bytes:
    # keep extract_size bytes starting at offset (drops the first offset bytes)
    newdata = chunk[offset:offset+extract_size]
    return newdata

def process_data(input_file: str, out_base_file: str, chunk_size: int, offset: int, extract_size: int):
    with open(input_file,"rb") as in_file:
        pos, file_no = 0, 0
        end = os.stat(input_file).st_size
        while(pos < end):
            chunk = in_file.read(chunk_size)
            out_data = extract_data(chunk,offset,extract_size)
            out_fn = out_base_file+"_"+str(file_no)
            out_file = open(out_fn,"wb")
            out_file.write(out_data)
            out_file.close()
            newpos = in_file.tell()
            print("chunk %i - %i saved to %s" %(pos,newpos,out_fn))
            pos, file_no = newpos, file_no+1

if __name__ == "__main__":
    args = create_parser()
    try:
        if (args.check_size): check_file_size(args.input_file, args.chunk_size);
        process_data(args.input_file,
                     args.output_file,
                     args.chunk_size,
                     args.offset,
                     args.extract_size)
    except Exception as e:
        print("%s" %e)
 

Crap, another missing line.

Code:
input_filename = '2352.bin'
chunk_size = 2352

file = open(input_filename, "rb") # Open the file in read binary mode

file.read(16) # Skips the first 16 bytes

# Using while loop to iterate the file data
while True:

    chunk = file.read(chunk_size - 16) # Reads only the first 2336 bytes of the chunk
    if not chunk: # read() returns an empty bytes object (b"") at end of file
        break
    # Processing the chunk of binary data
    print(f"Read {len(chunk)} bytes: {chunk}")

    file.read(16) # Skips the remaining 16 bytes of the chunk

file.close() # Close the file when done
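With chunk_size = 2352 (the value used elsewhere in the thread), this loop appears to produce exactly the same bytes as dropping the first 16 bytes of every 2352-byte chunk, which is what the ChatGPT and vetzki scripts do: skipping 16 bytes once and then discarding 16 bytes after each 2336-byte read removes the bytes at offsets 0 to 15, 2352 to 2367, and so on. A sketch comparing the two readings on in-memory data; both function names are made up for the comparison.

```python
def skip_once_then_trim(data: bytes, chunk_size: int = 2352, skip: int = 16) -> bytes:
    # Mirrors the loop above: skip `skip` bytes once, then keep
    # chunk_size - skip bytes and jump over the next `skip` bytes, repeatedly.
    out = bytearray()
    pos = skip
    while pos < len(data):
        out += data[pos:pos + chunk_size - skip]
        pos += chunk_size
    return bytes(out)

def strip_each_chunk(data: bytes, chunk_size: int = 2352, skip: int = 16) -> bytes:
    # Mirrors the ChatGPT/vetzki scripts: drop the first `skip` bytes of every chunk.
    out = bytearray()
    for start in range(0, len(data), chunk_size):
        out += data[start + skip:start + chunk_size]
    return bytes(out)
```

So "skip 16 bytes once, then trim every chunk" and "remove the 16-byte header of every sector" describe the same conversion.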
 
Thank You both for Your time!

@vetzki A powerful converter is born. ^^ However, it doesn't merge the chunks.
@GuilloteTesla Thanks, it runs now, displaying on screen:
Code:
python3 test7.py
Read 2532 bytes: b'\x00\x00\x08\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\

(...)

00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
Read 44 bytes: b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
But nothing happens to the original file (it has the same checksum as before), and no new file appears either.

- - -

I'm not sure how to merge the data in order in Python, or how to take this output and stream it to a new file. But if that is my homework, I will try to find a solution. ^^
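On merging: writing each extracted piece to the same output stream inside the loop already keeps the pieces in order, so no separate merge step is needed. A sketch that works on any binary streams, so a shell redirection can create the new file; `convert_stream` and `main` are illustrative names, and a shorter final chunk is simply sliced like the others.

```python
import sys
from typing import BinaryIO

def convert_stream(src: BinaryIO, dst: BinaryIO, chunk_size: int = 2352,
                   offset: int = 16, extract_size: int = 2336) -> int:
    """Copy the kept part of each chunk from src to dst, in order.

    Returns the number of bytes written.
    """
    written = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # read() returns b"" at end of input
            break
        written += dst.write(chunk[offset:offset + extract_size])
    return written

def main() -> None:
    # Hypothetical usage: python3 convert.py < 2352.bin > 2336.bin
    convert_stream(sys.stdin.buffer, sys.stdout.buffer)
```

Calling `main()` at the bottom of the script (for example under `if __name__ == "__main__":`) lets the shell write the output file.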
 
Awesome!

I realize now that there is an edge case to handle here. When the last chunk is read, its size could be smaller than chunk_size (as in your console output, where the last line says "Read 44 bytes ...").

Well, I suspect that those last 44 bytes were stripped of 16 bytes, and that must not happen (always with the idea of respecting the chunks). But I could be wrong, and the rule might apply to all the chunks, even the last one.
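One way to make this edge case explicit is to decide up front what happens to an incomplete final chunk, instead of letting the slice silently shorten it. A sketch with a hypothetical `on_partial` switch; which policy is right for a disc image is exactly the open question above.

```python
from typing import Iterator

def split_chunks(data: bytes, chunk_size: int = 2352,
                 on_partial: str = "error") -> Iterator[bytes]:
    """Yield full chunks, then handle an incomplete final chunk.

    on_partial is a hypothetical switch:
      "error" raises ValueError, "drop" ignores it, "keep" yields it as is.
    """
    if on_partial not in ("error", "drop", "keep"):
        raise ValueError(f"unknown on_partial value: {on_partial!r}")
    full = len(data) - len(data) % chunk_size
    for start in range(0, full, chunk_size):
        yield data[start:start + chunk_size]
    remainder = data[full:]
    if remainder and on_partial == "error":
        raise ValueError(f"{len(remainder)} trailing bytes do not fill a chunk")
    if remainder and on_partial == "keep":
        yield remainder
```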
 
I'm stupid. ^^ There is no lead-in in the disc image (but such an offset is still a great idea, e.g. for the VCD conversion case).
Code:
calc 692960352-16
   692960336
calc 692960336/2352
   ~294625.99319727891156462585
calc 692960352/2352
   294626
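The same check can be scripted: dividing the image size by 2352 shows whether there is any extra leading data, and the quotient gives the size of the converted file, since each sector shrinks by 16 bytes. A sketch; `plan_conversion` is a made-up helper name.

```python
def plan_conversion(image_size: int, sector_size: int = 2352,
                    header_size: int = 16) -> tuple[int, int]:
    """Return (sectors, output_size); raise if the image is not sector-aligned."""
    sectors, leftover = divmod(image_size, sector_size)
    if leftover:
        raise ValueError(f"{leftover} bytes left over: not a whole number of sectors")
    return sectors, sectors * (sector_size - header_size)

# The image size from the calc output above
sectors, output_size = plan_conversion(692960352)
```

For this image that gives 294626 sectors and a 688246336-byte output file (294626 × 2336).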
 

Sorry, I didn't understand at first that you want one output file (from all the chunks); this shouldn't be difficult to adjust. I can fix the script tomorrow.

edit:

This should write the chunks to one file instead of multiple files:
Code:
#!/usr/bin/python3

import os, argparse

def create_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("-i","--input-file",help="specify input file",required=True)
    parser.add_argument("-o","--output-file",help="specify output file name (default: out)",default="out")
    parser.add_argument("-cs","--chunk_size",help="used chunk size (default: 2352)",default=2352,type=int)
    parser.add_argument("-es","--extract_size",help="used extracted bytes after offset (default: 2336)",default=2336,type=int)
    parser.add_argument("-os","--offset",help="skip n bytes at the start of each chunk (default: 16)",default=16,type=int)
    parser.add_argument("--check-size",nargs="?",const=True,default=False)
    return(parser.parse_args())

def check_file_size(input_file: str, chunk_size: int):
    # use this if you want to ensure n number of same sized chunks
    if (os.stat(input_file).st_size % chunk_size != 0):
        raise Exception("size mismatch")

def extract_data(chunk:bytes,offset: int, extract_size: int) -> bytes:
    # keep extract_size bytes starting at offset (drops the first offset bytes)
    newdata = chunk[offset:offset+extract_size]
    return newdata

def process_data(input_file: str, out_fn: str, chunk_size: int, offset: int, extract_size: int):
    with open(input_file,"rb") as in_file:
        counter = 1
        pos = 0
        end = os.stat(input_file).st_size
        out_file = open(out_fn,"ab")
        while(pos < end):
            chunk = in_file.read(chunk_size)
            out_data = extract_data(chunk,offset,extract_size)
            out_file.write(out_data)
            newpos = in_file.tell()
            print("chunk %i - %i (no %i) saved to %s" %(pos,newpos,counter,out_fn))
            pos = newpos
            counter += 1
        out_file.close()

if __name__ == "__main__":
    args = create_parser()
    try:
        if (args.check_size): check_file_size(args.input_file, args.chunk_size);
        # check if output file already exits
        if (os.path.isfile(args.output_file)):
            raise Exception("File (%s) already exists" %(args.output_file))
        process_data(args.input_file,
                     args.output_file,
                     args.chunk_size,
                     args.offset,
                     args.extract_size)
    except Exception as e:
        print("%s" %e)

edit2:
fixed stupidity
 
@vetzki Thank You very much! This is a super handy application. May I ask You for just one more, last modification?
Would You kindly ^^ add a switch that tells the app whether to cut data from the beginning or from the end of each chunk?

For example, if the chunk is 512 bytes and the user wants to strip the first 16 bytes, or when the chunk is 1234 bytes and the user wants to cut only the last 4 bytes from each.

Something like that?
Code:
parser.add_argument("-d","--direction",help="from where cutting bytes (default: from end)",default=0,type=int)
 
This should already be possible by changing -es / --extract_size (together with --offset), e.g.:
Code:
$ ./py_simple_extract_data_chuncks_from_file.py -i some_file -cs 100 -es 96 --offset 0
chunk 0 - 100 (no 1) saved to out
chunk 100 - 200 (no 2) saved to out
chunk 200 - 300 (no 3) saved to out
chunk 300 - 400 (no 4) saved to out
mv@mv-pc:/tmp$ ls -l
insgesamt 16
-rw-r--r-- 1 root root   91 19. Okt 07:49 FAN_lastmode
-rw-r--r-- 1 mv   mv    384 19. Okt 07:59 out
-rwxr-xr-x 1 mv   mv   2583 19. Okt 07:57 py_simple_extract_data_chuncks_from_file.py
-rw-r--r-- 1 mv   mv    400 19. Okt 07:58 some_file
drwx------ 3 root root   60 19. Okt 07:49 systemd-private-2f9ce729bbfb4aa7b6267feb2f2a5446-colord.service-KVPhkg
drwx------ 3 root root   60 19. Okt 07:49 systemd-private-2f9ce729bbfb4aa7b6267feb2f2a5446-ModemManager.service-ie2dsg
drwx------ 3 root root   60 19. Okt 07:49 systemd-private-2f9ce729bbfb4aa7b6267feb2f2a5446-systemd-logind.service-aKyRLi
drwx------ 3 root root   60 19. Okt 07:49 systemd-private-2f9ce729bbfb4aa7b6267feb2f2a5446-systemd-resolved.service-BVaw7g
drwx------ 3 root root   60 19. Okt 07:49 systemd-private-2f9ce729bbfb4aa7b6267feb2f2a5446-systemd-timesyncd.service-DHXmeg
drwx------ 3 root root   60 19. Okt 07:53 systemd-private-2f9ce729bbfb4aa7b6267feb2f2a5446-upower.service-UTCz4g
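In other words, a separate direction switch is not strictly needed: cutting from the start of each chunk maps to --offset, and cutting from the end maps to a smaller --extract_size. A sketch of that translation for the two examples from the request; `cut_to_args` is a hypothetical helper, not part of the script.

```python
def cut_to_args(chunk_size: int, cut_front: int = 0, cut_back: int = 0) -> tuple[int, int]:
    """Translate 'cut N bytes from the start/end of each chunk' into
    (offset, extract_size) values for the script's --offset / --extract_size."""
    if cut_front < 0 or cut_back < 0 or cut_front + cut_back > chunk_size:
        raise ValueError("cuts must be non-negative and fit inside the chunk")
    return cut_front, chunk_size - cut_front - cut_back
```

For the 512-byte case this gives `-cs 512 --offset 16 -es 496`, and for the 1234-byte case `-cs 1234 --offset 0 -es 1230`.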
 
@vetzki Hello. I'm sorry for such a late answer.

I'm not sure I understood You or the syntax correctly. In the attachments there are two cases which I believe cannot both be achieved (the blue and green parts are what I want to keep in the output file):

Code:
python3 test.py -i test.bin -o blue.bin -os 16 -cs 16 -es 8
However, it produces a 0-byte file. Am I doing something wrong? "os" is for skipping the first 16 bytes, only once. "cs" is how large the chunks read from the file are, right? "es" is how much to cut from each chunk, right?
chunker_1.png


Code:
?
chunker_2.png
 


No, currently this cannot be achieved, because you want to skip 16 bytes once first and then skip 8 bytes in each chunk.

If the chunk size is 16 bytes and you skip the first 16 bytes of each chunk (-os 16), then there is nothing left to extract.

To get the green output (minus the first line, as this is currently not possible) you would use:
Code:
test.py -i test.bin --offset 8 -cs 16 -es 8 -o test_new.bin

Last but not least, it would be possible to add something like "--skip-once" or a similar option, but I'm not sure it wouldn't be easier to just dd the file first without the first 16 bytes (and then use test.py -i test.bin --offset 8 -cs 16 -es 8 -o test_new.bin).
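For illustration, the "skip once" idea as a plain function: it is roughly what a hypothetical `--skip-once` option would add on top of the existing per-chunk --offset, and it has the same effect as removing the leading bytes with dd before running the script. The sketch works on bytes in memory to stay short; a real disc image would be better processed chunk by chunk from the file.

```python
def extract_with_skip_once(data: bytes, skip_once: int, chunk_size: int,
                           offset: int, extract_size: int) -> bytes:
    """Drop skip_once bytes a single time, then keep extract_size bytes
    starting at offset within every chunk_size-byte chunk."""
    body = data[skip_once:]
    out = bytearray()
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        out += chunk[offset:offset + extract_size]  # b"" if offset >= len(chunk)
    return bytes(out)
```

With the numbers from the post, `extract_with_skip_once(data, 16, 16, 8, 8)` skips 16 bytes once and then keeps the second half of every 16-byte chunk, while a per-chunk offset of 16 on 16-byte chunks leaves nothing, which matches the 0-byte file.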
 
I think I've got it:

Code:
#!/usr/bin/python3

import os, argparse

def create_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("-i","--input-file", help="specify input file", required=True)
    parser.add_argument("-o","--output-file", help="specify output file name (default: out)", default="out")
    parser.add_argument("-cs","--chunk_size", help="used chunk size (default: 2352)", default=2352, type=int)
    parser.add_argument("-es","--extract_size", help="used extracted bytes after offset (default: 2336)", default=2336, type=int)
    parser.add_argument("-os","--offset", help="skip n bytes at start of file (default: 16)", default=16, type=int)
    parser.add_argument("--check-size", nargs="?", const=True, default=False)
    return(parser.parse_args())

def check_file_size(input_file: str, chunk_size: int):
    # use this if you want to ensure n number of same sized chunks
    if (os.stat(input_file).st_size % chunk_size != 0):
        raise Exception("size mismatch")

def extract_data(chunk: bytes, extract_size: int) -> bytes:
    # returns the first 'extract_size' bytes of a chunk
    newdata = chunk[:extract_size]
    return newdata

def process_data(input_file: str, out_fn: str, chunk_size: int, offset: int, extract_size: int):
    with open(input_file,"rb") as in_file, open(out_fn,"ab") as out_file:
        counter = 1
        in_file.seek(offset) # skips the first 'offset' bytes, only once
        pos = offset
        end = os.stat(input_file).st_size
        while(pos < end):
            chunk = in_file.read(chunk_size)
            out_data = extract_data(chunk, extract_size)
            out_file.write(out_data)
            newpos = in_file.tell()
            print("chunk %i - %i (no %i) saved to %s" %(pos, newpos, counter, out_fn))
            pos = newpos
            counter += 1

if __name__ == "__main__":
    args = create_parser()
    try:
        if (args.check_size): check_file_size(args.input_file, args.chunk_size);
        # check if output file already exits
        if (os.path.isfile(args.output_file)):
            raise Exception("File (%s) already exists" %(args.output_file))
        process_data(args.input_file,
                     args.output_file,
                     args.chunk_size,
                     args.offset,
                     args.extract_size)
    except Exception as e:
        print("%s" %e)

My only doubt is with the line newpos = in_file.tell() as I don't know what the tell() function returns after calling the extract_data() function. newpos should be at pos + chunk_size.
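On the tell() question: tell() reports the current position of the file object, and only operations on the file object such as read() or seek() move it; slicing the returned bytes in extract_data() does not. So after each read, newpos is pos + len(chunk), which equals pos + chunk_size except for a shorter final chunk, provided pos started at the real file position (i.e. the file was seeked to the offset first). A quick check with an in-memory file, assuming io.BytesIO behaves like a file opened in "rb" mode for this purpose:

```python
import io

def read_positions(data: bytes, chunk_size: int) -> list[int]:
    """Read data chunk by chunk and record tell() after each read."""
    stream = io.BytesIO(data)
    positions = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        _ = chunk[:8]  # slicing bytes in memory never moves the file position
        positions.append(stream.tell())
    return positions
```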

 
